cur decomposition
Debiasing Random Oblique Projections for Subsampled OLS and Fast CUR in High Dimensions
Niu, Chengmei, Garg, Sachin, Dereziński, Michał, Liao, Zhenyu
Random sampling is a fundamental tool in modern machine learning and numerical linear algebra for reducing the computational cost of large-scale matrix problems. Existing analyses, however, rely primarily on subspace embedding guarantees, which do not precisely characterize the statistical bias of nonlinear random oblique projections induced by sampling, which arises ubiquitously in subsampled least squares and fast low-rank approximation methods. Because (pseudo)inversion is nonlinear, these random oblique projections can be systematically biased even when the underlying sketch is unbiased, thereby introducing hidden bias into downstream least squares and low-rank approximation solutions. In this work, we develop a unified non-asymptotic theory for random oblique projections in high dimensions. We show that standard random sampling schemes generally induce a systematic statistical bias overlooked by classical subspace embedding-style analyses, and we propose a principled debiasing framework to correct it. We illustrate the power of the theory through two canonical applications. For subsampled least squares, we obtain sharp bias--variance characterizations, reveal previously unrecognized statistical suboptimality in widely used sampling schemes, and identify when debiasing yields provable improvements. For fast CUR decomposition, we develop a debiased approach with improved approximation accuracy. Numerical experiments further validate our theoretical findings.
Property Inheritance for Subtensors in Tensor Train Decompositions
Tensor dimensionality reduction is one of the fundamental tools for modern data science. To address the high computational overhead, fiber-wise sampled subtensors that preserve the original tensor rank are often used in designing efficient and scalable tensor dimensionality reduction. However, the theory of property inheritance for subtensors is still underdevelopment, that is, how the essential properties of the original tensor will be passed to its subtensors. This paper theoretically studies the property inheritance of the two key tensor properties, namely incoherence and condition number, under the tensor train setting. We also show how tensor train rank is preserved through fiber-wise sampling. The key parameters introduced in theorems are numerically evaluated under various settings. The results show that the properties of interest can be well preserved to the subtensors formed via fiber-wise sampling. Overall, this paper provides several handy analytic tools for developing efficient tensor analysis
CURing Large Models: Compression via CUR Decomposition
Park, Sanghyeon, Moon, Soo-Mook
Large deep learning models have achieved remarkable success but are resource-intensive, posing challenges such as memory usage. We introduce CURing, a novel model compression method based on CUR matrix decomposition, which approximates weight matrices as the product of selected columns (C) and rows (R), and a small linking matrix (U). We apply this decomposition to weights chosen based on the combined influence of their magnitudes and activations. By identifying and retaining informative rows and columns, CURing significantly reduces model size with minimal performance loss. For example, it reduces Llama3.1-8B's parameters to 7.32B (-9%) in just 129 seconds, over 20 times faster than prior compression methods.
Column and row subset selection using nuclear scores: algorithms and theory for Nystr\"{o}m approximation, CUR decomposition, and graph Laplacian reduction
Fornace, Mark, Lindsey, Michael
Column selection is an essential tool for structure-preserving low-rank approximation, with wide-ranging applications across many fields, such as data science, machine learning, and theoretical chemistry. In this work, we develop unified methodologies for fast, efficient, and theoretically guaranteed column selection. First we derive and implement a sparsity-exploiting deterministic algorithm applicable to tasks including kernel approximation and CUR decomposition. Next, we develop a matrix-free formalism relying on a randomization scheme satisfying guaranteed concentration bounds, applying this construction both to CUR decomposition and to the approximation of matrix functions of graph Laplacians. Importantly, the randomization is only relevant for the computation of the scores that we use for column selection, not the selection itself given these scores. For both deterministic and matrix-free algorithms, we bound the performance favorably relative to the expected performance of determinantal point process (DPP) sampling and, in select scenarios, that of exactly optimal subset selection. The general case requires new analysis of the DPP expectation. Finally, we demonstrate strong real-world performance of our algorithms on a diverse set of example approximation tasks.
On the Robustness of Cross-Concentrated Sampling for Matrix Completion
Cai, HanQin, Huang, Longxiu, Kundu, Chandra, Su, Bowen
Matrix completion is one of the crucial tools in modern data science research. Recently, a novel sampling model for matrix completion coined cross-concentrated sampling (CCS) has caught much attention. However, the robustness of the CCS model against sparse outliers remains unclear in the existing studies. In this paper, we aim to answer this question by exploring a novel Robust CCS Completion problem. A highly efficient non-convex iterative algorithm, dubbed Robust CUR Completion (RCURC), is proposed. The empirical performance of the proposed algorithm, in terms of both efficiency and robustness, is verified in synthetic and real datasets.
Robust Tensor CUR Decompositions: Rapid Low-Tucker-Rank Tensor Recovery with Sparse Corruption
Cai, HanQin, Chao, Zehan, Huang, Longxiu, Needell, Deanna
We study the tensor robust principal component analysis (TRPCA) problem, a tensorial extension of matrix robust principal component analysis (RPCA), that aims to split the given tensor into an underlying low-rank component and a sparse outlier component. This work proposes a fast algorithm, called Robust Tensor CUR Decompositions (RTCUR), for large-scale non-convex TRPCA problems under the Tucker rank setting. RTCUR is developed within a framework of alternating projections that projects between the set of low-rank tensors and the set of sparse tensors. We utilize the recently developed tensor CUR decomposition to substantially reduce the computational complexity in each projection. In addition, we develop four variants of RTCUR for different application settings. We demonstrate the effectiveness and computational advantages of RTCUR against state-of-the-art methods on both synthetic and real-world datasets.
Randomized Dimension Reduction with Statistical Guarantees
Large models and enormous data are essential driving forces of the unprecedented successes achieved by modern algorithms, especially in scientific computing and machine learning. Nevertheless, the growing dimensionality and model complexity, as well as the non-negligible workload of data pre-processing, also bring formidable costs to such successes in both computation and data aggregation. As the deceleration of Moore's Law slackens the cost reduction of computation from the hardware level, fast heuristics for expensive classical routines and efficient algorithms for exploiting limited data are increasingly indispensable for pushing the limit of algorithm potency. This thesis explores some of such algorithms for fast execution and efficient data utilization. From the computational efficiency perspective, we design and analyze fast randomized low-rank decomposition algorithms for large matrices based on "matrix sketching", which can be regarded as a dimension reduction strategy in the data space. These include the randomized pivoting-based interpolative and CUR decomposition discussed in Chapter 2 and the randomized subspace approximations discussed in Chapter 3. From the sample efficiency perspective, we focus on learning algorithms with various incorporations of data augmentation that improve generalization and distributional robustness provably. Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization where we view sample efficiency from the lens of dimension reduction in the function space. Then in Chapter 5, we introduce an adaptively weighted data augmentation consistency regularization algorithm for distributionally robust optimization with applications in medical image segmentation.
Matrix Completion with Cross-Concentrated Sampling: Bridging Uniform Sampling and CUR Sampling
Cai, HanQin, Huang, Longxiu, Li, Pengyu, Needell, Deanna
While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. In addition, we also provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC against uniform sampling and its baseline algorithms, on both synthetic and real-world datasets.
Non-convex approaches for low-rank tensor completion under tubal sampling
Tan, Zheng, Huang, Longxiu, Cai, HanQin, Lou, Yifei
Tensor completion is an important problem in modern data analysis. In this work, we investigate a specific sampling strategy, referred to as tubal sampling. We propose two novel non-convex tensor completion frameworks that are easy to implement, named tensor $L_1$-$L_2$ (TL12) and tensor completion via CUR (TCCUR). We test the efficiency of both methods on synthetic data and a color image inpainting problem. Empirical results reveal a trade-off between the accuracy and time efficiency of these two methods in a low sampling ratio. Each of them outperforms some classical completion methods in at least one aspect.
Co-Separable Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a popular model in the field of pattern recognition. It aims to find a low rank approximation for nonnegative data M by a product of two nonnegative matrices W and H. In general, NMF is NP-hard to solve while it can be solved efficiently under separability assumption, which requires the columns of factor matrix are equal to columns of the input matrix. In this paper, we generalize separability assumption based on 3-factor NMF M=P_1SP_2, and require that S is a sub-matrix of the input matrix. We refer to this NMF as a Co-Separable NMF (CoS-NMF). We discuss some mathematics properties of CoS-NMF, and present the relationships with other related matrix factorizations such as CUR decomposition, generalized separable NMF(GS-NMF), and bi-orthogonal tri-factorization (BiOR-NM3F). An optimization model for CoS-NMF is proposed and alternated fast gradient method is employed to solve the model. Numerical experiments on synthetic datasets, document datasets and facial databases are conducted to verify the effectiveness of our CoS-NMF model. Compared to state-of-the-art methods, CoS-NMF model performs very well in co-clustering task, and preserves a good approximation to the input data matrix as well.